Download data, Disposable income for households by region, type of
households and age, for three different group ages:
18-29
30-49
50-64
For year 2016.
We read data of swedish counties from the map into json.
We also change the data Disposable income for households by region, type
of households and age on the way that age groups are shown in different
columns named data.
18-29: Young
30-49: Adult
50-64: Senior
The head of processed data:
## Warning: package 'plotly' was built under R version 4.4.2
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## region Young Adult Senior
## 1 Stockholm 418.9 716.4 742.5
## 2 Uppsala 327.6 589.9 631.1
## 3 Södermanland 346.0 532.0 551.6
## 4 Östergötland 316.0 546.4 579.4
## 5 Jönköping 359.6 563.7 605.0
## 6 Kronoberg 334.4 546.9 577.2
Create a plot in Plotly containing three violin plots showing mean income distributions per age group.
Here we plot two different data :Data and Processed data
We can see difference in the xtitle of the maps.
We can see the big difference between the income range of young
compare to adult and senior.
It seems that the income increase between the level young , adult and
senior. But the differences for income of young people with others is a
lot.
The median for young people is 334, for adult is 531 and for Senior is
551.
Also the range of income change in young employee is less than others so
the income differences when people are adult or senior is more
possible.
plot in Plotly showing dependence of Senior incomes on Adult and Young incomes in various counties.
The surface plot:
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## The following object is masked from 'package:plotly':
##
## select
## Warning: package 'akima' was built under R version 4.4.2
Here we can see Positive Trend Senior income increase as young and
adult increase.
Linear regression seems to be reasonable as the surface follow a
linear pattern.
First we print the json data properties and we see that counties is
in properties.NAME_1.
scope = europe
## $GID_1
## [1] "SWE.2_1"
##
## $GID_0
## [1] "SWE"
##
## $COUNTRY
## [1] "Sweden"
##
## $NAME_1
## [1] "Dalarna"
##
## $VARNAME_1
## [1] "Dalecarlia|Kopparberg"
##
## $NL_NAME_1
## [1] "NA"
##
## $TYPE_1
## [1] "Län"
##
## $ENGTYPE_1
## [1] "County"
##
## $CC_1
## [1] "NA"
##
## $HASC_1
## [1] "SE.KO"
##
## $ISO_1
## [1] "NA"
By looking at the map we can find the cluster of regions that
are similar (which is not seen in previous statistical plot).
We can see that the mean of income in North of Sweden is less than
south.
Find unusual regions (compared to neighbor regions) and
patterns: Here, we see Stockholm is the best place to work for
all two age groups. Added to that for young people is not a bad choice
to o to the north for working, but for Adult there is no big difference
between the north and middle and the best choice is south of Sweden.
Add a red dot to the choropleth map for Young from step 4 in order to
show where we are located:
We find latitude,longitude of Linkoping the the site, http://www.gpsvisualizer.com/geocoder/ :
latitude,longitude,name,desc,color,source,precision
58.41109,15.62565,Linkoping,“Linkoping, Linkoping,
E
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
As we can see, regarding to the dataset the average income for Young people is in Östergötland is in lowest place compare to other part of Sweden.
knitr::opts_chunk$set(echo = TRUE)
# Assignment 2
#Part1 :
#Download map
library(rjson)
library(plotly)
library(dplyr)
library(ggplot2)
#library(MASS)
library(readr)
library(plotly)
json<-fromJSON(file="gadm41_SWE_1.json")
df<-read.csv("000006SW_20230919-130923.csv", fileEncoding="latin1")
#Read your data into R and process it in such a way
#that different age groups are shown in
#different columns. Let’s call these groups Young,
#Adult and Senior
data <- df %>%
filter(age %in% c("18-29 years"))%>%
dplyr::select(region,X2016)%>%
dplyr::rename(Young = X2016)
data<-mutate(data,df %>%
filter(age %in% c("30-49 years"))%>%dplyr::select( X2016)%>%
dplyr::rename(Adult = X2016))%>%mutate(data,df %>%
filter(age %in% c("50-64 years"))%>%dplyr::select( X2016)%>%
dplyr::rename(Senior = X2016))
head(data)
#Part2
#Create a plot in Plotly containing three violin plots
#showing mean income
#plot_ly by default data
p3<-plot_ly(df, x=~factor(age), y=~X2016, split=~factor(age),
type="violin", box=list(visible=T))%>% layout(yaxis=list(title="Mean income distributions"),
xaxis=list(title="Age Group"))
p3
#plot_ly by Processed data
p4<-plot_ly(data, type = "violin", box = list(visible = TRUE)) %>%
add_trace(y = ~Young, name = "Young") %>%
add_trace(y = ~Adult, name = "Adult") %>%
add_trace(y = ~Senior, name = "Senior") %>%
layout(title = "Mean Income Distribution by processed data",yaxis=list(title="Mean income distributions"),
xaxis=list(title="Age Group"))
p4
#part3
library(dplyr)
library(ggplot2)
library(MASS)
library(plotly)
#data %>%plot_ly(x=~Young, y=~Adult, z=~Senior, type="scatter3d")
#data %>%plot_ly(x=~Young, y=~Adult, z=~Senior, type="contour")
library(akima)
attach(data)
s=interp(Young,Adult,Senior, duplicate = "mean")
detach(data)
plot_ly(x=~s$x, y=~s$y, z=~s$z, type="surface") %>%
layout(scene = list(
xaxis = list(title = "Young Income"),
yaxis = list(title = "Adult Income"),
zaxis = list(title = "Senior Income")
),
title = "Dependence of Senior Incomes on Adult and Young Incomes by Region")
#part4
print(json$features[[2]]$properties)
#plot
g <- list(fitbounds = "location", visible = FALSE,scope = 'europe')
p1 <- plot_geo(data) %>% add_trace(
type = "choropleth",
geojson = json,
locations = ~region,
z = ~Young,
featureidkey = "properties.NAME_1") %>%
layout(geo = g, title = "Young Incomes by Region")
p1
p2 <- plot_geo(data) %>% add_trace(
type = "choropleth",
geojson = json,
locations = ~region,
z = ~Adult,
featureidkey = "properties.NAME_1") %>%
layout(geo = g, title = "Adult Incomes by Region")
p2
##ccoordinates of linkoping
#latitude,longitude,name,desc,color,source,precision
#58.41109,15.62565,Linkoping,"Linkoping, Linkoping, E, SE",,MapQuest,city/town
lat=58.41109
long=15.62565
p1 <- plot_geo(data) %>% add_trace(
type = "choropleth",
geojson = json,
locations = ~region,
z = ~Young,
featureidkey = "properties.NAME_1") %>%
layout(geo = g,title = "Adult Incomes by Region, Red dot is Linkoping ") %>%
add_markers(
x = ~long,
y = ~lat,
color = "red",
size=2
)
p1